Extreme Parkour with Legged Robots
Humans can perform parkour by traversing obstacles in a highly dynamic
fashion requiring precise eye-muscle coordination and movement. Getting robots
to do the same task requires overcoming similar challenges. Classically, this
is done by independently engineering perception, actuation, and control systems
to very low tolerances. This restricts such systems to tightly controlled settings such
as a predetermined obstacle course in labs. In contrast, humans are able to
learn parkour through practice without significantly changing their underlying
biology. In this paper, we take a similar approach to developing robot parkour
on a small low-cost robot with imprecise actuation and a single front-facing
depth camera for perception which is low-frequency, jittery, and prone to
artifacts. We show how a single neural net policy operating directly from a
camera image, trained in simulation with large-scale RL, can overcome imprecise
sensing and actuation to output highly precise control behavior end-to-end. We
show our robot can perform a high jump on obstacles 2x its height, long jump
across gaps 2x its length, do a handstand and run across tilted ramps, and
generalize to novel obstacle courses with different physical properties.
Parkour videos at https://extreme-parkour.github.io
Comment: Website and videos at https://extreme-parkour.github.io
A quantum-inspired tensor network method for constrained combinatorial optimization problems
Combinatorial optimization is of general interest for both theoretical study
and real-world applications. Fast-developing quantum algorithms provide a
different perspective on solving combinatorial optimization problems. In this
paper, we propose a quantum-inspired algorithm for general locally constrained
combinatorial optimization problems by encoding the constraints directly into a
tensor network state. The optimal solution can be found efficiently by
borrowing imaginary time evolution from quantum many-body systems. We
demonstrate our algorithm numerically on the open-pit mining problem. Our
computational results show the effectiveness of this construction and its
potential for further studies of general combinatorial optimization problems.
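The imaginary-time-evolution idea can be illustrated on a toy problem. The sketch below is a minimal NumPy illustration, not the paper's tensor-network construction: it assumes a hypothetical diagonal cost "Hamiltonian" over an explicitly enumerated set of configurations, repeatedly applies exp(-τH), and renormalizes, so that amplitude concentrates on the minimum-cost configuration.

```python
import numpy as np

# Toy imaginary-time evolution: for a diagonal cost "Hamiltonian" H over
# enumerated configurations, repeatedly applying exp(-tau * H) and
# renormalizing concentrates amplitude on the minimum-cost configuration.
# This brute-force version is illustrative only; the paper works with
# tensor network states instead of an explicit state vector.
def imaginary_time_minimize(costs, tau=0.5, steps=200):
    """costs: 1-D array of objective values, one per configuration."""
    costs = np.asarray(costs, dtype=float)
    psi = np.full(len(costs), 1.0 / np.sqrt(len(costs)))  # uniform start
    decay = np.exp(-tau * costs)                          # exp(-tau*H) is diagonal
    for _ in range(steps):
        psi = decay * psi
        psi /= np.linalg.norm(psi)                        # keep the state normalized
    return int(np.argmax(psi ** 2))                       # most probable configuration

best = imaginary_time_minimize([3.0, 1.0, 4.0, 0.5])
print(best)  # index 3, the minimum-cost configuration
```

The key design point is that imaginary time evolution damps high-cost amplitudes exponentially faster than low-cost ones, so the normalized state converges toward the ground state of the cost Hamiltonian.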
A Dynamic Graph Interactive Framework with Label-Semantic Injection for Spoken Language Understanding
Multi-intent detection and slot filling joint models are gaining increasing
traction since they are closer to complicated real-world scenarios. However,
existing approaches (1) focus on identifying implicit correlations between
utterances and one-hot encoded labels in both tasks while ignoring explicit
label characteristics; (2) directly incorporate multi-intent information for
each token, which could lead to incorrect slot prediction due to the
introduction of irrelevant intent. In this paper, we propose a framework termed
DGIF, which first leverages the semantic information of labels to give the
model additional signals and enriched priors. Then, a multi-grain interactive
graph is constructed to model correlations between intents and slots.
Specifically, we propose a novel approach to construct the interactive graph
based on the injection of label semantics, which can automatically update the
graph to better alleviate error propagation. Experimental results show that our
framework significantly outperforms existing approaches, obtaining a relative
improvement of 13.7% over the previous best model on the MixATIS dataset in
overall accuracy.
Comment: Submitted to ICASSP 202
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory
Recent video grounding works attempt to introduce vanilla contrastive
learning into video grounding. However, we argue that this naive solution is
suboptimal. Contrastive learning requires two key properties: (1)
\emph{alignment} of features of similar samples, and (2) \emph{uniformity} of
the induced distribution of the normalized features on the hypersphere. Due to
two issues in video grounding, (1) the co-existence of some visual entities
in both the ground truth and other moments, i.e., semantic overlap, and (2)
the annotation of only a few moments in each video, i.e., the sparse
annotation dilemma, vanilla contrastive learning fails to model correlations
between temporally distant moments and learns inconsistent video representations. Both
characteristics lead to vanilla contrastive learning being unsuitable for video
grounding. In this paper, we introduce Geodesic and Game Localization (G2L), a
semantically aligned and uniform video grounding framework via geodesic and
game theory. We quantify the correlations among moments leveraging the geodesic
distance that guides the model to learn the correct cross-modal
representations. Furthermore, from the novel perspective of game theory, we
propose semantic Shapley interaction based on geodesic distance sampling to
learn fine-grained semantic alignment in similar moments. Experiments on three
benchmarks demonstrate the effectiveness of our method.
Comment: ICCV202
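For background on the Shapley component, the sketch below computes exact Shapley values for a tiny cooperative game. The function name and the toy characteristic function are illustrative assumptions; the paper instead approximates such quantities via geodesic-distance-based sampling, since exact computation is exponential in the number of players.

```python
from itertools import combinations
from math import factorial

# Exact Shapley value of each player under characteristic function v.
# Exponential in |players|, so only feasible for tiny sets; G2L samples
# coalitions (guided by geodesic distance) rather than enumerating them.
def shapley_values(players, v):
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for r in range(n):
            for S in combinations(others, r):
                # weight of coalition S: |S|! * (n-|S|-1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[p] += w * (v(set(S) | {p}) - v(set(S)))
    return phi

# Toy game: the value of a coalition is its size squared.
vals = shapley_values(["a", "b", "c"], lambda S: len(S) ** 2)
print(vals)  # symmetric players split v(N)=9 equally: 3.0 each
```

By the efficiency axiom, the values always sum to v of the full coalition, which makes the attribution interpretable as a fair split of the total interaction.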
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Automatic radiology report generation has attracted enormous research
interest due to its practical value in reducing the workload of radiologists.
However, simultaneously establishing global correspondences between the image
(e.g., Chest X-ray) and its related report and local alignments between image
patches and keywords remains challenging. To this end, we propose an Unify,
Align and then Refine (UAR) approach to learn multi-level cross-modal
alignments and introduce three novel modules: Latent Space Unifier (LSU),
Cross-modal Representation Aligner (CRA) and Text-to-Image Refiner (TIR).
Specifically, LSU unifies multimodal data into discrete tokens, making it
flexible to learn common knowledge among modalities with a shared network. The
modality-agnostic CRA first learns discriminative features via a set of
orthonormal bases and a dual-gate mechanism, and then globally aligns visual
and textual representations under a triplet contrastive loss. TIR boosts
token-level local alignment via calibrating text-to-image attention with a
learnable mask. Additionally, we design a two-stage training procedure to make
UAR gradually grasp cross-modal alignments at different levels, which imitates
radiologists' workflow: writing sentence by sentence first and then checking
word by word. Extensive experiments and analyses on IU-Xray and MIMIC-CXR
benchmark datasets demonstrate the superiority of our UAR against varied
state-of-the-art methods.
Comment: 8 pages, 6 figures, 4 tables
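As a rough illustration of the triplet contrastive objective used for global alignment, here is a generic margin-based triplet loss over embeddings. The variable names, margin value, and toy data are assumptions for the sketch, not the paper's exact formulation.

```python
import numpy as np

# Generic margin-based triplet loss over L2-normalized global embeddings:
# pull the anchor (e.g., an image) toward its paired report and push it
# away from a mismatched one. An illustrative stand-in, not UAR's exact loss.
def triplet_loss(anchor, positive, negative, margin=0.2):
    anchor, positive, negative = (x / np.linalg.norm(x) for x in (anchor, positive, negative))
    d_pos = np.sum((anchor - positive) ** 2)   # squared distance to the match
    d_neg = np.sum((anchor - negative) ** 2)   # squared distance to the mismatch
    return max(0.0, d_pos - d_neg + margin)    # zero once the pair is separated

rng = np.random.default_rng(0)
img = rng.normal(size=64)
good_report = img + 0.05 * rng.normal(size=64)  # hypothetical matching report
bad_report = rng.normal(size=64)                # hypothetical mismatched report
print(triplet_loss(img, good_report, bad_report))  # 0.0: already separated by the margin
```

The loss is zero whenever the positive is closer than the negative by at least the margin, so training focuses on pairs that are still confusable.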
Exploiting Prompt Caption for Video Grounding
Video grounding aims to locate a moment of interest matching the given query
sentence from an untrimmed video. Previous works ignore the sparsity dilemma
in video annotations, which fails to provide context between potential events
and query sentences in the dataset. In this paper, we contend that exploiting
easily available captions describing general actions, i.e., prompt captions
(PC) as defined in our paper, will significantly
boost the performance. To this end, we propose a Prompt Caption Network (PCNet)
for video grounding. Specifically, we first introduce dense video captioning to
generate dense captions and then obtain prompt captions by Non-Prompt Caption
Suppression (NPCS). To capture the potential information in prompt captions, we
propose Caption Guided Attention (CGA) to project the semantic relations between
prompt captions and query sentences into temporal space and fuse them into
visual representations. Considering the gap between prompt captions and ground
truth, we propose Asymmetric Cross-modal Contrastive Learning (ACCL) for
constructing more negative pairs to maximize cross-modal mutual information.
Without bells and whistles, extensive experiments on three public datasets
(i.e., ActivityNet Captions, TACoS and ActivityNet-CG) demonstrate that our
method significantly outperforms state-of-the-art methods.
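To make the contrastive component concrete, here is a generic InfoNCE-style loss sketch that contrasts one positive against a pool of negatives. ACCL's specific asymmetric pair construction is not reproduced; the extra negatives it builds would simply enlarge the negative pool passed in here, and all names and parameters below are illustrative assumptions.

```python
import numpy as np

# InfoNCE-style cross-modal loss: a query embedding is contrasted against
# one positive moment embedding and many negatives. Minimizing it pushes a
# lower bound on cross-modal mutual information upward.
def info_nce(query, positive, negatives, temperature=0.1):
    def cos(a, b):  # cosine similarity
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    logits = np.array([cos(query, positive)] + [cos(query, n) for n in negatives])
    logits /= temperature
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                      # positive sits at index 0

rng = np.random.default_rng(1)
q = rng.normal(size=32)
pos = q + 0.1 * rng.normal(size=32)               # hypothetical matching moment
negs = [rng.normal(size=32) for _ in range(8)]    # hypothetical mismatched moments
print(info_nce(q, pos, negs))  # near zero: the positive dominates the softmax
```

A low temperature sharpens the softmax, so hard negatives (those most similar to the query) dominate the gradient.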
Vitexin attenuates smoke inhalation induced acute lung injury in rats by inhibiting oxidative stress via PKC β/p66Shc signaling pathway
Purpose: To investigate the protective effect of vitexin on smoke inhalation-induced acute lung injury (SI-ALI) and the underlying mechanism of action.
Methods: The ALI rat model was established by inhalation of smoke in a closed smoke chamber. Survival rate, arterial blood gas analysis, wet-to-dry weight ratio of lung tissues, bronchoalveolar lavage fluid protein concentration, lung tissue histology, and oxidative stress and inflammation levels were evaluated. Expression of protein kinase C β (PKC β), p66Shc, and phosphorylated p66Shc was determined by western blot or quantitative reverse transcription-polymerase chain reaction.
Results: Compared with the smoke inhalation group, vitexin alleviated the decline in arterial partial pressure of oxygen (p < 0.05), reduced lung tissue exudation and pathological lung tissue damage, inhibited the expression of PKC β/p66Shc signaling pathway proteins, downregulated the levels of oxidative stress and inflammation, and ultimately improved the survival rate of SI-ALI rats (p < 0.05).
Conclusion: Vitexin attenuates SI-ALI in rats by alleviating oxidative stress via inhibition of the PKC β/p66Shc signaling pathway. This compound is therefore a potential agent for the treatment of SI-ALI.
Metagenomic sequencing for identifying pathogen-specific circulating DNAs and development of diagnostic methods for schistosomiasis
Summary: Timely diagnosis of Schistosoma infection, particularly at an early stage, is crucial for identifying infected hosts and implementing effective control strategies. Here, metagenomic next-generation sequencing was used to identify pathogen-specific circulating DNAs (cDNAs) in the sera/plasma of New Zealand rabbits infected with S. japonicum, and the identified cDNAs were validated by PCR and qPCR. Loop-mediated isothermal amplification (LAMP)-based CRISPR-Cas12a and recombinase polymerase amplification-based lateral flow strip (RPA-LF) methods, combined with the newly identified cDNAs, were developed to evaluate their potential for diagnosing murine and human schistosomiasis. Twenty-two cDNAs were identified. The developed LAMP-based CRISPR/Cas12a and RPA-LF methods showed good potential for diagnosing murine or human schistosomiasis as early as 5 days post-infection with as few as 5 cercariae. In summary, S. japonicum-specific cDNAs circulating in infected hosts could be effective biomarkers for detecting Schistosoma infection, particularly at early stages.